Microbiome assembly predictably shapes diversity across a range of disturbance frequencies in experimental microcosms

Diversity is often implied to have a positive effect on the functional stability of ecological communities. However, its relationship with stochastic and deterministic assembly mechanisms remains largely unknown, particularly under fluctuating disturbances. Here, we subjected complex bacterial communities in microcosms to different frequencies of alteration in substrate feeding scheme, tracking temporal dynamics in their assembly, structure and function. Activated sludge bioreactors were subjected to six different frequencies of double organic loading, either never (undisturbed), every 8, 6, 4, or 2 days (intermediately disturbed), or every day (press disturbed), and operated in daily cycles for 42 days. Null modeling revealed a stronger role of stochastic assembly at intermediate disturbance frequencies, with a peak in stochasticity that preceded the occurrence of a peak in α-diversity. Communities at extreme ends of the disturbance range had the lowest α-diversity and highest within-treatment similarity in terms of β-diversity, with stronger deterministic assembly. Increased carbon removal and microbial aggregate settleability (general functions) correlated with stronger deterministic processes. In contrast, higher stochasticity correlated with higher nitrogen removal (a specialized function) only during initial successional stages at intermediate disturbance frequencies. We show that changes in assembly processes predictably precede changes in diversity under a gradient of disturbance frequencies, advancing our understanding of the mechanisms behind disturbance–diversity–function relationships.

Supplementary Figure 2 -Community structure and assembly assessed via richness ( 0 D), Hill diversity of 1 st ( 1 D) and 2 nd ( 2 D) order, phylogenetic diversity unweighted (PD) and abundanceweighted (PDW), nearest taxon index unweighted () and abundance-weighted (W), from bacterial ASV data for different frequencies of organic loading disturbance (n = 5). Disturbance frequency levels (L): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). In: sludge inoculum (day 0, n = 4). Each panel represents a sampling day, red diamonds display mean values. Welch's ANOVA P-values adjusted at 5% FDR shown within panels. Note the inverted y-axis for both NTI and NTIW, as values closer to zero indicate a higher relative contribution of stochastic assembly. Shaded in grey is the zone of significant stochastic phylogenetic dispersion, |NTI| < 2 and |NTIW| < 2. The box bounds the interquartile range (IQR) divided by the median, and Tukey-style whiskers extend to a maximum of 1.5 times the IQR beyond the box. Figure 3 -Community structure and assembly assessed via richness ( 0 D), Hill diversity of 1 st ( 1 D) and 2 nd ( 2 D) order, phylogenetic diversity non-weighed (PD) and abundanceweighed (PDW), nearest taxon index unweighted () and abundance-weighted (W), from bacterial ASV data for different frequencies of organic loading disturbance (n = 5). Disturbance frequency values were calculated from the frequency of the high organic loading at each disturbance level. In: sludge inoculum (day 0, n = 4). Each panel represents a sampling day. Blue line represents locally estimated scatterplot smoothing regression (loess). Note the inverted y-axis for both NTI and NTIW, as values closer to zero indicate a higher relative contribution of stochastic assembly. Shaded in grey is the zone of significant stochastic phylogenetic dispersion, |NTI| < 2 and |NTIW| < 2. Figure 4 -Richness ( 0 D), Hill -diversity of 1 st ( 1 D) and 2 nd ( 2 D) order, unweighted (PD) and abundance-weighted phylogenetic diversity (PDW), correlated against unweighted (, upper panels) and abundance-weighted nearest taxon index (W, lower panels), from bacterial ASV data for all frequency levels and time points evaluated in this study (m = 184). Kendall correlation and Pvalues adjusted at 5% FDR are indicated within the panels. Blue line represents locally estimated scatterplot smoothing regression (loess) with confidence interval in dark-grey shading. Note the inverted y-axis for both NTI and NTIW, as values closer to zero indicate a higher relative contribution of stochastic assembly. Shaded in grey is the zone of significant stochastic phylogenetic dispersion, |NTI| < 2 and |NTIW| < 2.

Supplementary
Supplementary Figure 5 -Phylogenetic Mantel correlogram evaluating phylogenetic signal in the sludge bioreactor communities sampled in this study. The plot relates between-ASV niche differences to between-ASV phylogenetic distances across a given phylogenetic distance. Significant correlations (Padj < 0.05; closed symbols) indicate significant phylogenetic signal in ASV ecological niches within the associated phylogenetic distance class. The analysis shows a significant phylogenetic signal but mostly across relatively short phylogenetic distances. Figure 6 -Temporal dynamics of community assembly via -diversity null modelling of phylogenetic turnover across samples. Within-treatment pairwise values of the -nearest taxon index, unweighted (NTI, upper panels) and abundance-weighted (NTIW, lower panels), from bacterial ASV data for different frequencies of organic loading disturbance (n = 10). Disturbance frequency levels (L): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). In: sludge inoculum (day 0, n = 6). Each panel represents a sampling day, red diamonds display mean values. Welch's ANOVA P-values adjusted at 5% FDR shown within panels. Shaded in grey is the zone where stochastic processes significantly dominate, |βNTI| < 2. βNTI values closer to zero indicate a higher relative contribution of stochastic assembly. The box bounds the interquartile range (IQR) divided by the median, and Tukeystyle whiskers extend to a maximum of 1.5 times the IQR beyond the box.

Supplementary Figure 7 -Community structure dynamics for bacterial (A, C) phyla and (B, D)
genera, assessed through 16S rRNA gene metabarcoding. The 10 most abundant phyla and the 20 most abundant genera are shown (rows). Columns show the average percentage read abundance among reactors (n = 5) for (A-B) a given disturbance level across time, and for (C-D) a given day across disturbance levels. Disturbance frequency levels (L): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). In: sludge inoculum (day 0, n = 4). Figure 8 -Effluent values of total Kjeldahl nitrogen (TKN), ammonia (NH4 + -N), nitrite (NO2 --N) and nitrate (NO3 --N) as nitrogen for different frequencies of organic loading disturbance (n = 5). Disturbance frequency levels (L): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). In: sludge inoculum (day 0, n = 4). Each panel represents a sampling day, and red diamonds display mean values. The box bounds the interquartile range (IQR) divided by the median, and Tukey-style whiskers extend to a maximum of 1.5 times the IQR beyond the box. Figure 9 -Community function assessed via influent chemical oxygen demand removal (carbon removal, left panels), sludge volume index (sludge settleability, middle panels), and influent total Kjeldahl nitrogen removal (nitrogen removal, right panels), correlated against richness ( 0 D), Hill -diversity of 1 st ( 1 D) and 2 nd ( 2 D) order, unweighted (PD) and abundance-weighted phylogenetic diversity (PDW), unweighted (, upper panels) and abundance-weighted nearest taxon index (W, lower panels), from bacterial ASV data for all frequency levels and time points evaluated in this study (m = 184). Kendall correlation and P-values adjusted at 5% FDR are indicated within the panels. Blue line represents locally estimated scatterplot smoothing regression (loess) with confidence interval in dark-grey shading. Shaded in grey is the zone of significant stochastic phylogenetic dispersion, |NTI| < 2 and |NTIW| < 2. Note the inverted axis for sludge settleability, as it improves with decreasing SVI values, and for both NTI and NTIW, since values closer to zero indicate a higher relative contribution of stochastic assembly. Figure 10 -Community function assessed via influent chemical oxygen demand removal (carbon removal, left panels), sludge volume index (sludge settleability, middle panels), and influent total Kjeldahl nitrogen removal (nitrogen removal, right panels), correlated against unweighted (, upper panels) and abundance-weighted nearest taxon index (W, lower panels), from bacterial ASV data at initial stages of succession (d0 to d21, m = 94). Kendall correlation and P-values adjusted at 5% FDR are indicated within the panels. Blue line represents locally estimated scatterplot smoothing regression (loess) with confidence interval in dark-grey shading. Shaded in grey is the zone of significant stochastic phylogenetic dispersion, |NTI| < 2 and |NTIW| < 2. Note the inverted axis for sludge settleability, as it improves with decreasing SVI values, and for both NTI and NTIW, since values closer to zero indicate a higher relative contribution of stochastic assembly. Figure 11 -Schematic representation of the experimental design and timeline, including disturbance frequency (left column) and incidence (right column). Red squares indicate double organic loading in the feed for that day. Level numbers ranged from 0 to 5 (0 for no disturbance, 1 to 5 for low to high disturbance frequency). Effluent and sludge sampling was done on the days marked in yellow. Figure 12 -Rarefaction plots for 16S rRNA gene sequencing data. (A) Rarefaction curves for reactors within a given disturbance level across time. (B) Rarefied versus observed number of ASVs. Disturbance frequency levels (L, n = 5): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). In: Sludge inoculum (day 0, n = 4).

Supplementary Table 1.
Multivariate tests on bacterial community structure  across disturbance frequency levels.
 The Bray-Curtis dissimilarity metric was used on square-root transformed ASV relative abundance data. * Factor levels (L): 0 (undisturbed), 1-4 (intermediately disturbed), 5 (press disturbed). † Number of permutations used was 9,999 ‡ Number of replicates per level § Degrees of freedom of the residual ¶ P-values after correction for multiple comparisons at 5% FDR, via the Benjamini-Hochberg's method.

Time (d)
No of levels* n ‡ df § res